System and method for multilanguage text input in a handheld electronic device

ABSTRACT

A system provides multilanguage text input in a handheld electronic device. The system includes one or more applications implemented in the handheld electronic device. The applications include a text input application requiring access to language data usable thereby. One or more language databases contain language data from a plurality of different languages usable by at least one of the applications including the text input application. An interface provides the applications with access to at least some of the different languages of the language data of the one or more language databases, in order that the applications including the text input application receive the different languages.

BACKGROUND OF THE INVENTION

1. Field of the Invention

This invention relates generally to handheld electronic devices and, more particularly, to a method and system for inputting different languages among one or more applications, such as a text input application, run by the handheld electronic device.

2. Background Information

Handheld electronic devices are becoming ubiquitous. Examples include personal digital assistants (PDAs), handheld computers, two-way pagers, cellular telephones, text messaging devices, and the like. Many of these handheld electronic devices incorporate wireless communications, although others are stand-alone devices that do not communicate with other devices.

As these handheld electronic devices have become more popular, there has been a growing demand for more functionality and sophistication. While it has been common to provide multiple functions, such as an address book, spell check and text input, the latter in particular has become more complex. This is due at least partially to the trend to make these handheld electronic devices smaller and lighter. A limitation in making them smaller has been the physical size of the keyboard if the keys are to be actuated directly by human fingers. Generally, there have been two approaches to solving this problem. One is to adapt the ten-digit keypad native to mobile phones for text input. This requires each key to support input of multiple characters. The second approach seeks to shrink the traditional full keyboard, such as the “qwerty” keyboard, by doubling up characters to reduce the number of keys. In both cases, the input generated by actuation of a key representing multiple characters is ambiguous. Various schemes have been devised to interpret inputs from these multi-character keys. Some schemes require actuation of the key a specific number of times to identify the desired character. Others use software to progressively narrow the possible combinations of letters that could be intended by a specified sequence of keystrokes. This approach uses multiple lists that can contain, for instance, prefixes, generic words, learned words, and the like.

Typically, the various applications have had their own database or databases upon which they draw. Thus, the address book application had its own list of addresses used only for that application, the spell check application had its own database of words, and while the text input application could have multiple lists (e.g., word lists; prefix lists; n-gram lists; learning lists) of a particular single language, those lists were only used by that text input application. This can lead to duplication of data and an inefficient use of memory, which limits the ability to reduce the size, weight and energy use of the handheld electronic device.

The problem of disambiguating text input is even larger when the input might be desired in a number of different languages, such as, for example, English/French or English/Spanish. Switching between languages to input the words of each language is cumbersome. Also, the memory space required on the device is higher.

There is room for improvement in systems and methods for multilanguage text input in a handheld electronic device.

SUMMARY OF THE INVENTION

These needs and others are met by the invention, which permits multilanguage text input employing linguistic data in a plurality of different languages using the same script or alphabet (e.g., Latin; Cyrillic). This saves space and does not require switching between different languages during text input.

In accordance with aspects of the invention, one or more applications, including a text input application, in a handheld electronic device share one or more different language databases, thereby reducing the burden on memory. Thus, for example, the text input application can use one or more different language databases for multilanguage text input of language data from a plurality of different languages. Generally then, an application can access language data from one, some or all of the different language databases containing language data usable by it.

In accordance with one aspect of the invention, a system for multilanguage text input in a handheld electronic device comprises: at least one application implemented in the handheld electronic device, the at least one application comprising a text input application requiring access to language data usable thereby; at least one language database containing language data from a plurality of different languages usable by at least one of the at least one application including the text input application; and an interface providing the at least one application with access to at least some of the different languages of the language data of the at least one language database, in order that the at least one application including the text input application receives the different languages.

The at least one language database may be a single language database containing blended information from two or more different languages.

The language data may comprise a mixture of a plurality of different languages using the same script or alphabet.

The at least one language database may be a plurality of language databases containing information from a plurality of different languages.

A first one of the different language databases may contain information from a first language of the different languages; and a second one of the different language databases may contain information from a second language of the different languages.

A first one of the different language databases may contain information from a first language of the different languages; and a second one of the different language databases may contain information from a second language and a third language of the different languages.

As another aspect of the invention, a method of multilanguage text input in a handheld electronic device comprises: implementing at least one application in the handheld electronic device, the at least one application comprising a text input application requiring access to language data usable thereby; employing at least one language database containing language data from a plurality of different languages usable by at least one of the at least one application including the text input application; and interfacing the at least one application with at least some of the different languages of the language data of the at least one language database, in order that the at least one application including the text input application receives the different languages.

The method may employ as the at least one language database a single language database including blended information from two or more different languages.

The method may employ as the at least one application the text input application and a spell check application; and include in the different languages of the language data a plurality of words usable by the text input application and the spell check application, and frequency data for the words usable only by the text input application.

The method may input text input including the at least some of the different languages of the language data; and seamlessly provide predictive text without regard to the different languages of the text input.

The method may include with the at least some of the different languages of the language data a mixture of a plurality of different languages using the same script or alphabet.

The method may employ as the at least one language database a plurality of different language databases including information from a plurality of different languages.

BRIEF DESCRIPTION OF THE DRAWINGS

A full understanding of the invention can be gained from the following description of the preferred embodiments when read in conjunction with the accompanying drawings in which:

FIG. 1 is a front view of a handheld electronic device incorporating the invention.

FIG. 2 is a block diagram illustrating the major components of the handheld electronic device of FIG. 1.

FIG. 3 is a functional diagram of a data adapter, which is one of the components illustrated in FIG. 2.

FIG. 4 is a block diagram illustrating other major components of the handheld electronic device of FIG. 1 in accordance with an embodiment of the invention.

FIG. 5 is a flowchart illustrating a method of creating compact linguistic data.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The invention is disclosed in connection with a reduced keyboard 5 and disambiguation of text input, although the invention is applicable to a wide range of applications for handheld electronic devices.

FIG. 1 illustrates a wireless handheld electronic device 1, which is but one type of handheld electronic device to which the invention can be applied. The handheld electronic device 1 includes an input device 3 in the form of a keyboard 5 and a thumbwheel 6 that are used to control the functions of the handheld electronic device and to generate text and other inputs. The keyboard 5 constitutes a compressed “qwerty” keyboard in which each of the keys 7 is used to input two or even three letters of the alphabet. Thus, initially the input generated by depressing one of these keys is ambiguous in that it is undetermined which letter was intended. As discussed previously, various schemes have been devised for disambiguating the inputs generated by these keys 7 assigned multiple letters for input. The particular scheme used is not relevant to the invention. However, text input applications that use software to progressively narrow the possible combinations of letters that could be intended by a specified sequence of keystrokes use multiple linguistic lists of a particular single language. The inputs provided through the keyboard 5 and thumbwheel 6 are displayed on a display 9 as is well known.

Turning to FIG. 2, the input device 3 provides keystroke inputs to an execution system 11 that may be an operating system, a Java virtual machine, a run time environment, or the like. The handheld electronic device 1 implements a plurality of applications 13. These applications can include an address book 15, a text input application 17, a translation application 19, a spell check application 21 and a number of other applications up to an application n 23.

Each of the applications 13 requires access to data needed for that application to run and produce a meaningful output. Such data is stored in a plurality of databases 25. For example, the address book application 15 requires access to addressee names and mailing addresses and/or e-mail addresses or the like that are stored in the address database 27. The address book application 15 is different from most of the other applications 13 in that it only draws information from the address database 27, as that is the only location for the specific data needed for addressing. Another application that only draws from one database is an auto text application (not shown). An auto text application provides full text for abbreviated inputs, such as “best regards” for “BR” and other shortcut inputs. Such an application improves efficiency by allowing the user to expedite input by only providing a cryptic code for a commonly used word or phrase. Thus, other more general databases cannot provide useful information to the auto text application.

Some applications 13, such as the text input application 17, utilize multiple types of linguistic data. The typical disambiguation type of text input application, for instance, utilizes a generic word list stored in a generic word list database 29. Such a text input application can also use a new word list stored in a new word list database 31 and a learning list stored in a learning list database 33. Additional lists not shown in FIG. 2 that can be used by the text input application 17 could include a prefix list and an n-gram list. Additional databases 35 (e.g., without limitation, linguistic databases for one or more different languages) primarily associated with one or more of the additional applications 23 can also be provided.

The text input application 17, in implementing disambiguation, displays the variants possible at each stage in the sequence of key inputs, ordered according to frequency of use and with whole words first. Thus, the databases primarily associated with or created for use by the text input application 17 include frequency of use data as part of the linguistic data. This includes, for instance, the generic word list database 29, the new word list database 31 and the learning list database 33.

Databases primarily for one application can be used by other applications. For example, the spell check application 21, which in the exemplary system has no specific databases created especially for it, can utilize data in other databases. Thus, the spell check application draws from the generic word list database 29, the new word list database 31 and the learning list database 33. However, spell check does not need, and therefore does not use, the frequency of use data in these databases. This exemplifies that some databases contain some information that can be used, and some that cannot be used, by a particular application.

On the other hand, the text input application 17, which utilizes frequency of use data, can draw on a database, such as the address database 27, that does not provide frequency of use data. As will be explained, a frequency of use can be automatically assigned where it is absent. Note that the spell check application 21 can also draw on the data stored in the address database 27. No frequency of use is needed by the spell check application 21 and, hence, there is no need to generate such data, as there is in the case of the text input application 17.

Each of the applications 13 communicates, through an interface 37, with the databases 25 that contain data that the application can use. In the case of the address book application 15, which can only utilize data from the address database 27, a direct connection 39 provides this interface. Such a direct connection, wherein the application can form its request for data and process the responses in a fixed format, is well known. Applications, such as the text input application 17, that can draw on data in multiple databases 25 require as the interface 37 a data adapter 41 associated with each such database and a path 43 between the data adapter and the application. In this arrangement, the application formulates a data request that is forwarded over the appropriate path 43 to the data adapters 41 associated with the plurality of databases 25 containing usable data for the request. Each data adapter 41 obtains the requested data from the associated database and returns it to the application over the appropriate path 43. Hence, the application can receive, in response to a single request for data, responses from multiple databases. The application then selects from among the responses returned by multiple databases, such as by eliminating duplicate responses and sorting the responses. The latter can include sorting the responses in accordance with frequency of use.
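
The following Python sketch illustrates the selection step described above under stated assumptions: each database is modeled as a hypothetical dictionary mapping words to frequencies of use, and the names query_all and select are illustrative only, not part of the disclosed system.

```python
from typing import Dict, List, Tuple

# Hypothetical model of databases 25: word -> frequency of use.
Database = Dict[str, int]

def query_all(prefix: str, databases: List[Database]) -> List[Tuple[str, int]]:
    """Forward one request to every database and collect all matching entries."""
    responses: List[Tuple[str, int]] = []
    for db in databases:
        responses.extend((w, f) for w, f in db.items() if w.startswith(prefix))
    return responses

def select(responses: List[Tuple[str, int]]) -> List[str]:
    """Eliminate duplicate words, then sort by descending frequency of use."""
    best: Dict[str, int] = {}
    for word, freq in responses:
        # A word returned by several databases keeps its highest frequency.
        best[word] = max(freq, best.get(word, freq))
    return sorted(best, key=lambda w: (-best[w], w))

generic = {"the": 250, "there": 120, "then": 90}
learned = {"there": 130, "thesaurus": 10}
print(select(query_all("the", [generic, learned])))  # ['the', 'there', 'then', 'thesaurus']
```

Keeping the highest frequency for a duplicated word is only one possible policy; the description does not say how duplicates carrying different frequencies are reconciled.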

FIG. 3 illustrates the functional organization of the data adapter 41. An interface module 45 receives the request for data from the application 13 and passes it to logic 47 that formulates a query understandable by a reader 49 containing the arguments in the data request from the application. The reader 49 reads the requested data from the associated database 25 and returns it to the logic, which in turn generates a response that is returned to the requesting application 13 by the interface module 45. In generating the response, selected logic 47 can be applied to the results received from the database. For instance, when the requesting application requires frequency of use data, and the database does not contain this information, the logic can assign a frequency of use. In the exemplary data adapter 41, the logic 47 applies a frequency of use in the upper 25% or so of the range of frequencies of use. Other arrangements can be used to assign a frequency of use where needed. Where frequency of use is assigned or is received as part of the results returned by the reader from the database, additional logic, such as sorting according to the frequency of use, can be applied in generating the response. The response generated by the logic is then returned to the requesting application by the interface module.
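
A minimal Python sketch of the data adapter 41 follows. The class name DataAdapter, the 0 to 255 frequency range, and the choice of the value at the lower edge of the top quarter of that range are all assumptions made for illustration; the description states only that an assigned frequency lies in the upper 25% or so of the range.

```python
from typing import Dict, Iterable, List, Optional, Tuple, Union

MAX_FREQ = 255                        # hypothetical top of the frequency range
ASSIGNED_FREQ = int(MAX_FREQ * 0.75)  # 191: lower edge of the upper 25%

Entry = Tuple[str, Optional[int]]

class DataAdapter:
    """Sketch of data adapter 41: interface module 45, logic 47, reader 49."""

    def __init__(self, database: Union[Dict[str, int], Iterable[str]]):
        # A dict carries frequencies of use; a plain word list does not.
        if isinstance(database, dict):
            self._db: Dict[str, Optional[int]] = dict(database)
        else:
            self._db = {word: None for word in database}

    def _reader(self, prefix: str) -> List[Entry]:
        # Reader 49: fetch the raw entries that match the query.
        return [(w, f) for w, f in self._db.items() if w.startswith(prefix)]

    def request(self, prefix: str, wants_frequency: bool) -> List[Entry]:
        # Interface module 45 receives the request; logic 47 shapes the response.
        results = self._reader(prefix)
        if not wants_frequency:
            # e.g. spell check: frequency data is neither needed nor generated.
            return [(w, None) for w, _ in sorted(results, key=lambda r: r[0])]
        # Assign a frequency where the database has none, then sort on it.
        filled = [(w, ASSIGNED_FREQ if f is None else f) for w, f in results]
        return sorted(filled, key=lambda r: (-r[1], r[0]))

address_db = DataAdapter(["Smith", "Smythe"])
generic_db = DataAdapter({"small": 200, "smile": 150})
print(address_db.request("Sm", wants_frequency=True))   # [('Smith', 191), ('Smythe', 191)]
print(generic_db.request("sm", wants_frequency=False))  # [('small', None), ('smile', None)]
```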

It can be appreciated from the above that, through sharing of multiple databases by multiple applications, the memory resources of a handheld electronic device can be more efficiently employed, thereby making possible a reduction in the size, weight and energy consumption of such devices.

The same processing as was discussed above in connection with FIGS. 2 and 3 is involved in dealing with a language dictionary, such as the linguistic database 35. The disclosed method and system allow input from the reduced keyboard 5 of FIG. 1 of the characters from different languages, although a full keyboard (not shown) or other suitable input device may be employed. As shown in FIG. 4, the disclosed method and system provide multilanguage text input using one or more different linguistic databases 51,53,55,57 in the handheld electronic device 1 of FIG. 1. One or more applications 13, including the text input application 17, are implemented in the handheld electronic device 1 and require access to language data usable thereby. Each of the applications, such as, for example, 17, 21 and 23 of FIG. 4, requires access to different language data 59,61,63,65 usable by that application. The different linguistic databases 51,53,55,57 contain respective different language data 59,61,63,65 from a plurality of different languages usable by the applications. The interfaces 41 may, thus, provide one or more of the applications 13 with access to one, some or all of the different linguistic databases 51,53,55,57, in order that those applications, including the text input application 17, receive at least some of the different languages of the language data of the one or more databases.

It will be appreciated that some of the applications 13 may access one, some or all of the different linguistic databases 51,53,55,57 and the respective different language data 59,61,63,65.

The disclosed method and system provide multilanguage text input of different language data, such as 59,61,63,65, which comprises a mixture of two or more different languages (e.g., without limitation, English, French and German) using the same script or alphabet. Several examples follow.

EXAMPLE 1

The first example is one linguistic source 51 containing blended information from two or more different languages (e.g., without limitation, English words, French words and German words, along with frequencies for each of those words).

EXAMPLE 2

The second example is two or more different linguistic sources 53,55 containing the respective different linguistic data 61,63 from two or more different languages. Here, the different linguistic databases 53,55 contain information from a plurality of different languages using the same script or alphabet.

EXAMPLE 3

As a more specific example of Example 2, there may be a first linguistic source 53 containing information 61 from a first language (e.g., without limitation, English) and a second, different linguistic source 55 containing information 63 from a second, different language (e.g., without limitation, German).

EXAMPLE 4

As another more specific example of Example 2, there may be a first linguistic source 53 containing information 61 from a first language (e.g., without limitation, English) and a second, different linguistic source 57 containing information 65 from two or more second, different languages (e.g., without limitation, French and German).
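
To illustrate that the way languages are split across sources need not matter to a requesting application, the Python sketch below models Examples 1, 3 and 4 as lists of hypothetical word-to-frequency dictionaries and runs the same prefix query against each. The word entries, frequencies and the function name lookup are invented for illustration.

```python
from typing import Dict, List

# Hypothetical entries: word -> frequency, from English, French and German.
EN = {"the": 250, "then": 90}
FR = {"thé": 60, "théâtre": 40}
DE = {"thema": 70}

def lookup(prefix: str, sources: List[Dict[str, int]]) -> List[str]:
    """Query every linguistic source and merge hits by descending frequency."""
    merged: Dict[str, int] = {}
    for source in sources:
        for word, freq in source.items():
            if word.startswith(prefix):
                merged[word] = max(freq, merged.get(word, 0))
    return sorted(merged, key=lambda w: (-merged[w], w))

example_1 = [{**EN, **FR, **DE}]  # one blended source (51)
example_3 = [EN, DE]              # one source per language (53, 55)
example_4 = [EN, {**FR, **DE}]    # English alone; French and German blended (53, 57)

print(lookup("th", example_1))  # ['the', 'then', 'thema', 'thé', 'théâtre']
print(lookup("th", example_3))  # ['the', 'then', 'thema']
```

Examples 1 and 4 hold the same words, so they return identical candidates even though the languages are partitioned differently.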

Linguistic data, such as 61, may be created as discussed below in connection with Example 5.

EXAMPLE 5

FIG. 5 is a flowchart illustrating a method of creating compact linguistic data. The method uses a word-list containing word frequency information to produce compact linguistic data, and includes word prefix indexing and statistical character substitution. See, for example, U.S. patent application Ser. No. 10/289,656.

The method begins at step 500, where the word-list is read from an output file that was produced by a method of word frequency calculation. The words in the word-list are then sorted alphabetically.

The method continues with step 501 of normalizing the absolute frequencies in the word-list. Each absolute frequency is replaced by a relative frequency. Absolute frequencies are mapped to relative frequencies by applying a function, which may be specified by a user. Possible functions include a parabolic, Gaussian, hyperbolic or linear distribution.
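
Step 501 might be sketched in Python as follows, assuming relative frequencies are scaled into a hypothetical 0 to 255 range and that the user-specified function maps the interval [0, 1] onto itself. Only linear and parabolic functions are shown; Gaussian or hyperbolic mappings would plug in the same way.

```python
from typing import Callable, Dict

MAX_RELATIVE = 255  # hypothetical: one byte per relative frequency

def linear(x: float) -> float:
    return x

def parabolic(x: float) -> float:
    # Pushes low counts further down while keeping the top of the range.
    return x * x

def normalize(counts: Dict[str, int],
              fn: Callable[[float], float] = linear) -> Dict[str, int]:
    """Replace each absolute count with a relative frequency in 0..MAX_RELATIVE."""
    top = max(counts.values())
    return {w: round(fn(c / top) * MAX_RELATIVE) for w, c in counts.items()}

counts = {"the": 1000, "cat": 500}
print(normalize(counts))             # {'the': 255, 'cat': 128}
print(normalize(counts, parabolic))  # {'the': 255, 'cat': 64}
```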

The method continues with step 502 of creating a character-mapping table. The character-mapping table is used to encode words in a subsequent step. When encoding is performed, the characters in the original words are replaced with the indexes of those characters in the character-mapping table. Since the size of the alphabet for alphabetical languages is much less than 256, a single byte is enough to store each character as its index, even where the original Unicode character would need more. For example, the Unicode character 0x3600 can be represented as 10 if it is located at index 10 in the character-mapping table. The location of a character in the character-mapping table is not significant, and is based on the order that characters appear in the given word-list.
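
A Python sketch of step 502 and of the character-index encoding it enables is shown below. The function names are hypothetical, and indexes are assigned in order of first appearance, consistent with the statement that a character's position in the table is not significant.

```python
from typing import Dict, List

def build_char_map(words: List[str]) -> Dict[str, int]:
    """Give each distinct character an index in order of first appearance."""
    table: Dict[str, int] = {}
    for word in words:
        for ch in word:
            if ch not in table:
                if len(table) == 256:
                    raise ValueError("alphabet does not fit in one byte")
                table[ch] = len(table)
    return table

def encode(word: str, table: Dict[str, int]) -> bytes:
    # One byte per character, regardless of the character's code point.
    return bytes(table[ch] for ch in word)

def decode(data: bytes, table: Dict[str, int]) -> str:
    reverse = {i: ch for ch, i in table.items()}
    return "".join(reverse[b] for b in data)

t = build_char_map(["über", "uns"])
print(t)                 # {'ü': 0, 'b': 1, 'e': 2, 'r': 3, 'u': 4, 'n': 5, 's': 6}
print(encode("uns", t))  # b'\x04\x05\x06'
```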

The method continues with step 504 of separating the words in the word-list into groups. Words in each group have a common prefix of a given length and are sorted by frequency. Words are initially grouped by prefixes that are two characters long. If there are more than 256 words that start with the same two-character prefix, then additional separation will be performed with longer prefixes. For example, if the word-list contains 520 words with the prefix “co”, then this group will be separated into groups with prefixes “com”, “con”, and so on.
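
The grouping rule of step 504 can be sketched recursively in Python, assuming the word-list is a dictionary of word frequencies. The 256-word limit and the two-character starting prefix come from the description; the recursion, the function name group_by_prefix, and the handling of a word identical to its prefix are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List

MAX_GROUP = 256  # groups larger than this are split on a longer prefix

def group_by_prefix(freqs: Dict[str, int], length: int = 2) -> Dict[str, List[str]]:
    """Group words by a shared prefix; each group is sorted by frequency."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for word in freqs:
        buckets[word[:length]].append(word)
    groups: Dict[str, List[str]] = {}
    for prefix, members in buckets.items():
        if len(members) <= MAX_GROUP:
            # Small enough: keep the group, most frequent word first.
            groups[prefix] = sorted(members, key=lambda w: -freqs[w])
            continue
        # Too many words share this prefix, so re-split on a longer one.
        # The only member no longer than the prefix is the prefix itself.
        if prefix in freqs:
            groups[prefix] = [prefix]
        longer = {w: freqs[w] for w in members if len(w) > length}
        groups.update(group_by_prefix(longer, length + 1))
    return groups

words = {f"com{i}": 1 for i in range(130)}
words.update({f"con{i}": 1 for i in range(130)})
words.update({"cat": 5, "cab": 9})
groups = group_by_prefix(words)
print(sorted(groups))  # ['ca', 'com', 'con']
print(groups["ca"])    # ['cab', 'cat']
```

Here the 260 words beginning with "co" exceed the limit, so they are regrouped under "com" and "con", while the two "ca" words stay together.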

The method continues with step 506 of producing a frequency set for each group of words. In order to reduce the amount of space required to store frequency information, only the maximum frequency of words in each group is retained with full precision. The frequency of each other word is retained as a percentage of the maximum frequency of words in its group. This technique causes some loss of accuracy, but this is acceptable for the purpose of text input prediction, and results in a smaller storage requirement for frequency information.
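
Step 506 can be illustrated with a short Python sketch, assuming percentages are stored as whole numbers; the function names are hypothetical. The restore function shows the small loss of accuracy the description mentions.

```python
from typing import List, Tuple

def frequency_set(freqs: List[int]) -> Tuple[int, List[int]]:
    """Keep the group maximum exactly; store each frequency as a whole percentage."""
    top = max(freqs)
    return top, [round(100 * f / top) for f in freqs]

def restore(top: int, percents: List[int]) -> List[int]:
    """Approximate the original frequencies from the compact frequency set."""
    return [round(top * p / 100) for p in percents]

top, percents = frequency_set([840, 420, 333])
print(top, percents)             # 840 [100, 50, 40]
print(restore(top, percents))    # [840, 420, 336]  (333 comes back as 336)
```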

The method continues with step 508. In order to reduce the amount of data required to store the words in the word-list, the character sequences that occur most frequently in the words are replaced with substitution indexes. The substitution of n-grams, which are sequences of n characters, enables a number of characters to be represented by a single character. This information is stored in a substitution table. The substitution table is indexed, so that each n-gram is mapped to a substitution index. The words can then be compacted by replacing each n-gram with its substitution index in the substitution table each time the n-gram appears in a word.
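
The n-gram substitution of step 508 might look like the following Python sketch. It assumes a fixed n of 2, a hypothetical table size, and greedy left-to-right replacement; the description does not say how n, the table size, or overlapping n-grams are handled. Substitution indexes are kept as integers so they stay distinguishable from literal characters.

```python
from collections import Counter
from typing import Dict, List, Union

def build_substitution_table(words: List[str], n: int = 2, size: int = 2) -> Dict[str, int]:
    """Map the `size` most frequent n-grams to substitution indexes 0, 1, ..."""
    counts = Counter(w[i:i + n] for w in words for i in range(len(w) - n + 1))
    return {gram: idx for idx, (gram, _) in enumerate(counts.most_common(size))}

def substitute(word: str, table: Dict[str, int], n: int = 2) -> List[Union[int, str]]:
    """Replace each table n-gram with its index, scanning left to right."""
    out: List[Union[int, str]] = []
    i = 0
    while i < len(word):
        gram = word[i:i + n]
        if len(gram) == n and gram in table:
            out.append(table[gram])  # one token stands for n characters
            i += n
        else:
            out.append(word[i])
            i += 1
    return out

table = build_substitution_table(["there", "then", "other", "the"])
print(table)                        # {'th': 0, 'he': 1}
print(substitute("there", table))   # [0, 'e', 'r', 'e']
print(substitute("other", table))   # ['o', 0, 'e', 'r']
```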

The method continues with step 510 of encoding the word groups into byte sequences using the character-mapping table and the substitution table, as described above. The prefixes used to collect words into groups are removed from the words themselves. As a result, each word is represented by a byte sequence, which includes all the data required to find the original word, given its prefix.

The method continues with step 511 of creating word definition tables. The word definition tables store the frequency sets calculated at step 506 and the encoded words produced at step 510.

The method continues with step 512 of creating an offset table. The offset table contains byte sequences that represent the groups of words. This table enables the identification of the start of byte sequences that represent particular word groups. The offset table is used to locate the byte sequences that comprise the encoded words for a particular group that start with a common prefix.
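
The following Python sketch combines the prefix stripping of step 510 with the offset table of step 512. For brevity it applies only the character-mapping table (no n-gram substitution), and it stores a hypothetical length byte before each encoded word so word boundaries can be found; the description does not specify how boundaries are marked. The offset table is modeled here as a mapping from each group prefix to the start of that group's byte sequence, and the function names are illustrative.

```python
from typing import Dict, List, Tuple

def build_tables(groups: Dict[str, List[str]],
                 char_map: Dict[str, int]) -> Tuple[bytes, Dict[str, int]]:
    """Concatenate encoded groups into one blob and record where each starts."""
    blob = bytearray()
    offsets: Dict[str, int] = {}
    for prefix in sorted(groups):
        offsets[prefix] = len(blob)  # offset table entry: start of this group
        for word in groups[prefix]:
            suffix = word[len(prefix):]  # the group prefix is not stored
            blob.append(len(suffix))     # hypothetical length byte per word
            blob.extend(char_map[ch] for ch in suffix)
    return bytes(blob), offsets

def read_group(prefix: str, blob: bytes, offsets: Dict[str, int],
               char_map: Dict[str, int]) -> List[str]:
    """Jump straight to one group via the offset table and decode its words."""
    reverse = {i: ch for ch, i in char_map.items()}
    start = offsets[prefix]
    later = sorted(s for s in offsets.values() if s > start)
    end = later[0] if later else len(blob)
    words, pos = [], start
    while pos < end:
        length = blob[pos]
        body = blob[pos + 1:pos + 1 + length]
        words.append(prefix + "".join(reverse[b] for b in body))
        pos += 1 + length
    return words

char_map = {"m": 0, "e": 1, "n": 2, "t": 3}
blob, offsets = build_tables({"co": ["come", "cone"], "te": ["ten"]}, char_map)
print(list(blob), offsets)                       # [2, 0, 1, 2, 2, 1, 1, 2] {'co': 0, 'te': 6}
print(read_group("co", blob, offsets, char_map)) # ['come', 'cone']
```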

The method concludes with step 514. At this step, the linguistic data resulting from the method has been stored in the tables that have been created. The data tables, including the character-mapping table, the substitution table, the offset table and the word definition tables, are stored in an output file.

Statistical data gathered during the method of creating compact linguistic data may optionally be stored at step 514. The statistical data includes the frequency with which n-grams stored in the substitution table appear in words in the linguistic data, the number of words in the linguistic data, word-list and corpus from which the word-list was generated, and ratios between the numbers of words in the linguistic data, word-list and corpus.

It will be appreciated that the teachings of Example 5, above, may now be applied to different languages (e.g., English; French; German) using the same script or alphabet.

Examples 6-8, below, include different applications 13 that employ one, some or all of the different linguistic databases 51,53,55,57 of FIG. 4. These applications function in the same manner, except that for text prediction (Examples 6 and 7), the application 17 requests all the words starting from the various possible prefixes, while for disambiguation, the application 17 requests only the most frequent word for each of the possible prefixes.
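
The difference between the two request styles can be sketched in Python as follows. The two-letters-per-key layout, the mixed-language lexicon and all function names are hypothetical and chosen only for illustration; the actual key assignments of keyboard 5 are not specified here.

```python
from itertools import product
from typing import Dict, List

# Hypothetical layout: each reduced-keyboard key carries two letters.
KEYS: Dict[str, str] = {"1": "er", "2": "ty", "3": "gh", "4": "as"}

# Hypothetical mixed English/French lexicon: word -> frequency of use.
LEXICON: Dict[str, int] = {"the": 250, "thé": 60, "tag": 40, "tas": 30, "yacht": 5}

def possible_prefixes(keys: str) -> List[str]:
    """Every letter combination the ambiguous key sequence could stand for."""
    return ["".join(p) for p in product(*(KEYS[k] for k in keys))]

def predict(keys: str, lexicon: Dict[str, int]) -> List[str]:
    """Text prediction: all words starting with any possible prefix."""
    prefixes = possible_prefixes(keys)
    hits = [w for w in lexicon if any(w.startswith(p) for p in prefixes)]
    return sorted(hits, key=lambda w: (-lexicon[w], w))

def disambiguate(keys: str, lexicon: Dict[str, int]) -> Dict[str, str]:
    """Disambiguation: only the most frequent word for each possible prefix."""
    best: Dict[str, str] = {}
    for p in possible_prefixes(keys):
        matches = [w for w in lexicon if w.startswith(p)]
        if matches:
            best[p] = max(matches, key=lexicon.__getitem__)
    return best

print(predict("24", LEXICON))       # ['tag', 'tas', 'yacht']
print(disambiguate("24", LEXICON))  # {'ta': 'tag', 'ya': 'yacht'}
print(predict("23", LEXICON))       # ['the', 'thé']
```

The last call returns an English and a French candidate for the same keystrokes, which is the language-independent behavior that Examples 6 and 7 describe.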

EXAMPLE 6

The text input application 17 includes text prediction using the reduced keyboard 5 of FIG. 1. At the time of the text input, the system employs one, some or all of the different language data 59,61,63,65 from one, some or all of the different linguistic sources 51,53,55,57 to seamlessly provide predictive text without regard to which language or languages the input text belongs to.

EXAMPLE 7

Another text input application, such as 23, includes text prediction using a full keyboard (not shown). Again, at the time of the text input, the system employs one, some or all of the different language data 59,61,63,65 from one, some or all of the different linguistic sources 51,53,55,57 to seamlessly provide predictive text without regard to which language or languages the input text belongs to.

EXAMPLE 8

The application 21 includes spell checking. The system includes the text input application 17 and the spell check application 21. The different linguistic databases 51,53,55,57 include a plurality of words usable by the text input application 17 and the spell check application 21, and frequency data 67,69,71,73 for the words usable only by the text input application 17.

While for clarity of disclosure reference has been made herein to the exemplary display 9 for displaying the variants possible at each stage in the sequence of key inputs as well as other output information from the execution system 11, it will be appreciated that such information may be stored, printed on hard copy, be computer modified, or be combined with other data. All such processing shall be deemed to fall within the terms “display” or “displaying” as employed herein.

While specific embodiments of the invention have been described in detail, it will be appreciated by those skilled in the art that various modifications and alternatives to those details could be developed in light of the overall teachings of the disclosure. Accordingly, the particular arrangements disclosed are meant to be illustrative only and not limiting as to the scope of the invention, which is to be given the full breadth of the claims appended and any and all equivalents thereof.

1-20. (canceled)
21. A system for multilanguage text input in a handheld electronic device, the system comprising: a multilanguage text input application implemented in the handheld electronic device; a first language database comprising first language data from a first language usable by the multilanguage text input application; a second language database comprising second language data from a second language usable by the multilanguage text input application; and an interface communicating with the multilanguage text input application, the interface providing the multilanguage text input application, at the time of multilanguage text input, with the first language data from the first language database and the second language data from the second language database in response to a request for data from the multilanguage text input application to the interface.
22. The system of claim 21, further comprising at least one additional application implemented in the handheld electronic device, the interface further providing the at least one additional application with data from the first language database or the second language database.
23. The system of claim 21, further comprising a spell check application.
24. The system of claim 21, wherein the multilanguage text input application employs a reduced keyboard.
25. The system of claim 21, wherein the first language data comprises a mixture of a plurality of different languages using the same script or alphabet.
26. The system of claim 21, wherein the first language is English and the second language is German.
27. The system of claim 21, further comprising a third language database comprising third language data from a third language usable by the multilanguage text input application.
28. The system of claim 27, wherein the first language is English, the second language is French, and the third language is German.
29. A method of multilanguage text input in a handheld electronic device, the method comprising: implementing a multilanguage text input application in the handheld electronic device; employing a first language database comprising first language data from a first language usable by the multilanguage text input application; employing a second language database comprising second language data from a second language usable by the multilanguage text input application; and employing an interface to communicate with the multilanguage text input application, the interface providing the multilanguage text input application, at the time of multilanguage text input, with the first language data from the first language database and the second language data from the second language database in response to a request for data from the multilanguage text input application to the interface.
30. The method of claim 29, further comprising: employing at least one additional application implemented in the handheld electronic device, the interface further providing the at least one additional application with data from the first language database or the second language database.
31. The method of claim 30, further comprising: employing as the at least one additional application a text input application and a spell check application.
32. The method of claim 29, further comprising: inputting the text input from a reduced keyboard.
33. The method of claim 29, further comprising: including with the first language data a mixture of a plurality of different languages using the same script or alphabet.
34. The method of claim 29, wherein the first language is English and the second language is German.
35. The method of claim 29, further comprising employing a third language database comprising third language data from a third language usable by the multilanguage text input application.
36. The method of claim 35, wherein the first language is English, the second language is French, and the third language is German.
37. The method of claim 29, further comprising selecting an output using frequency data.